Result Page Generation for Web Searching by Mostafa Alli


Author: Mostafa Alli
Language: eng
Format: epub, pdf
Publisher: IGI Global


An earlier study (Teufel and Moens, 2002) showed that summarizing scientific papers requires a dedicated summarization method, and that other existing methods may not be a good fit. We showed in section [] why extractive summarization suits this aim. In this section, we run an empirical study of Google Scholar's ranking policy, comparing its default ranking, which relies heavily on citation scores, with a ranking based on the textual similarity of papers.

EMPIRICAL STUDY

In this study, we investigated the ranking behavior of Google Scholar based on citation scores (the default) and based on the textual similarity of papers. To compute the similarity of papers, we extracted the textual content of each paper and applied stop-word removal to clean the plain text. We then identified the most frequent keyword in the resulting bag of words and extracted the portions of the paper in which this keyword occurs; this forms the normal version of the paper's summary. In addition, we applied stemming after stop-word removal to group related words under a common root as a word family. We then counted word frequencies again, selected the most frequent stemmed keyword, and produced a summary by the same procedure used for normal keywords. We used both versions of the summaries in case they led to different behavior.
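The two summary variants described above can be sketched roughly as follows. This is a minimal illustration, not the study's actual implementation: the stop-word list, the `crude_stem` suffix stripper (a stand-in for a real stemmer), and the choice to treat "the textual portion around the keyword" as the sentences containing it are all assumptions, and every function name here is hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption: the study does not name its list).
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is", "it",
              "of", "on", "that", "the", "this", "to", "we", "with"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(word):
    """Rough suffix stripping; a placeholder for a real stemming algorithm."""
    for suffix in ("ing", "ies", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def content_words(text, stem=False):
    """Tokens with stop words removed, optionally reduced to crude stems."""
    words = [w for w in tokenize(text) if w not in STOP_WORDS]
    return [crude_stem(w) for w in words] if stem else words

def summarize(text, stem=False):
    """Return (top keyword, summary built from sentences containing it)."""
    counts = Counter(content_words(text, stem))
    if not counts:
        return None, ""
    keyword = counts.most_common(1)[0][0]
    # Assumption: the extracted "portion" is every sentence containing the keyword.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [s for s in sentences if keyword in content_words(s, stem)]
    return keyword, " ".join(kept)
```

Calling `summarize(text)` gives the normal-keyword summary and `summarize(text, stem=True)` the stemmed one. Because stemming merges word variants, the two calls can select different keywords and therefore different summaries for the same paper.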

Evaluation Metric

For this study, we evaluated Google Scholar's ranking behavior using normality tests and regression curve estimation.

Procedure

To evaluate the ranking policy of Google Scholar and observe the effect of our proposal, we selected 8 random papers as input for Google Scholar. For each input paper, we stored the papers that Google Scholar returned on the first 4 result pages, together with their citation scores and their ranks in the Google Scholar listing. We then produced summaries of the same papers using the aforementioned policies for both normal and stemmed keywords. Finally, we re-ranked the Google Scholar listings based on these two types of similarity and analyzed them using the evaluation metrics.

Similarities are computed with cosine similarity, which operates in a vector space. The common formula for cosine similarity is:

\[ \mathrm{sim}(A, B) = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}} \tag{1} \]

where $A$ and $B$ are the vectors for the summarized versions of two given papers. We used these similarities between the candidate papers and the input paper to generate a ranked list of similar papers.

Here, each of these vectors represents the collection of words from one of the documents being compared.
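As a minimal sketch of the similarity and re-ranking step, the code below builds a term-frequency vector from each summary's word list, computes the cosine similarity, and sorts candidate papers by their similarity to the input paper. Raw term-frequency weighting is an assumption, since the text does not say how the vectors are weighted, and the function names and paper identifiers are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(words_a, words_b):
    """Cosine similarity between two word lists, using raw term counts."""
    a, b = Counter(words_a), Counter(words_b)
    # Only words present in both vectors contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty summary is treated as similar to nothing
    return dot / (norm_a * norm_b)

def rank_by_similarity(input_words, candidates):
    """Sort candidates (paper id -> word list) by similarity to the input paper."""
    scored = [(paper_id, cosine_similarity(input_words, words))
              for paper_id, words in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, `rank_by_similarity(input_summary_words, {"p1": [...], "p2": [...]})` returns `(paper id, similarity)` pairs with the most similar paper first, which can then be compared against the positions the same papers hold in the citation-based listing.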

Result

Accuracy and Distinctiveness of Similarity Values

In this section, we evaluate the similarity values produced by our proposed mechanism. As a first step, we applied a sample t-test to determine whether the two similarity procedures produced significantly different sets of similarity values. Because the p-value in all cases satisfied 0.0 < p-value < 0.03, we conclude that the similarity values produced by normal keywords are significantly different from those produced by stemmed keywords.
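The comparison above can be illustrated with a small sketch of a paired t statistic. The text does not state which t-test variant was used; a paired test is assumed here only because both procedures score the same set of papers. The function name is hypothetical, and any numbers passed to it in an example would be placeholders rather than the study's data.

```python
import math
import statistics

def paired_t_statistic(normal_sims, stemmed_sims):
    """Return (t, degrees of freedom) for paired similarity values."""
    if len(normal_sims) != len(stemmed_sims) or len(normal_sims) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    # Work on the per-paper differences between the two procedures.
    diffs = [x - y for x, y in zip(normal_sims, stemmed_sims)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

The p-value is then obtained from a t distribution with the returned degrees of freedom, for instance through a statistics package, and compared against the chosen significance level.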

As the next step, we measured the mean similarity value of papers produced by each summary type. According to the results, normal keywords produce higher similarity values of papers (19.


